Multiple Character Regular Expressions

Single character regular expressions can be combined into more complex regular expressions that match more than one character at a time. You can also apply modifiers to single character regular expressions to make them match more than one instance of the character or characters specified by the single character regular expression.

Use the following rules to construct regular expressions from single character regular expressions:

    A single character regular expression followed by an asterisk (*) is a regular expression that matches zero or more consecutive occurrences of the single character regular expression, always as many as possible. For example,  f*  matches zero or more consecutive  f  characters.

    A single character regular expression followed by \{m\}, \{m,\}, or \{m,n\} is a regular expression that matches a specified number of occurrences of the single character regular expression. The values of m and n must be nonnegative integers less than 255. \{m\} matches exactly m occurrences of the single character regular expression; \{m,\} matches at least m occurrences; \{m,n\} matches any number of occurrences between m and n, inclusive. Whenever a choice exists, the regular expression matches as many occurrences as possible. For example, the regular expression  [0-9]\{1,4\}  matches any sequence of 1 to 4 digits.

    A concatenation of regular expressions is a regular expression that matches the concatenation of the strings matched by each component of the regular expression. For example, the regular expression  [0-9][a-z]  matches any string of two characters that starts with a digit and ends with a lowercase letter.

    A regular expression enclosed between the character sequences \( and \) is a sub-expression that matches whatever the original regular expression matches. The string matched by the sub-expression is placed in an internal, numbered register for later use. The registers are numbered according to the order of the \( and \) pairs within the whole regular expression; the first pair corresponds to register 1. For example, if a string matches the regular expression  \([0-9]*\)\([a-z]*\) , register 1 will contain whatever sequence of characters matched the regular expression  [0-9]* , and register 2 will contain whatever sequence of characters matched the regular expression  [a-z]* . There may be up to 9 sub-expressions in a regular expression.

    The expression \n, where n is a digit from 1 through 9, matches the same string of characters stored in internal register number n; that is, the same string of characters that the sub-expression corresponding to n originally matched. Regular expressions of the form \n are meaningless unless there is a sub-expression corresponding to n. For example, the regular expression  \([0-9]\)\1\1  matches any string that consists of the same digit three times, such as  777 .

    A regular expression preceded by a caret (^) must match at the beginning of the target string. A regular expression followed by a dollar sign ($) must match at the end of the target string. Both may be used in the same regular expression. For example, the regular expression  ^[0-9]*$  matches only strings that consist entirely of digits (including the empty string, since  *  allows zero occurrences).
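
The rules above can be tried out with a short sketch. It uses Python's re module purely for illustration, which is an assumption and not the tool this section documents. Python writes intervals as {m,n} and sub-expressions as ( ), without the backslashes used in the syntax described here, so each pattern below is the Python spelling of the corresponding example. The variable names are hypothetical.

```python
import re

# Asterisk: zero or more occurrences, as many as possible.
star = re.match(r"f*", "fffx")          # takes all three leading f characters
star_empty = re.match(r"f*", "xyz")     # zero occurrences still counts as a match

# Interval: \{1,4\} in the syntax above is written {1,4} in Python.
interval = re.match(r"[0-9]{1,4}", "123456")   # stops after four digits

# Concatenation: a digit followed by a lowercase letter.
concat_ok = re.fullmatch(r"[0-9][a-z]", "7q")
concat_bad = re.fullmatch(r"[0-9][a-z]", "q7")

# Sub-expressions: \( \) in the syntax above is written ( ) in Python.
# group(1) and group(2) play the role of registers 1 and 2.
groups = re.match(r"([0-9]*)([a-z]*)", "42abc")

# Back-reference: \1 must repeat exactly what sub-expression 1 matched.
same_digit = re.fullmatch(r"([0-9])\1\1", "777")
mixed_digit = re.fullmatch(r"([0-9])\1\1", "778")

# Anchors: the whole string must consist of digits (possibly none).
all_digits = re.match(r"^[0-9]*$", "2024")
not_digits = re.match(r"^[0-9]*$", "20a24")
empty_ok = re.match(r"^[0-9]*$", "")

print(star.group(), interval.group(), groups.group(1), groups.group(2))
```

Note that the interval example stops at four digits even though six are available, and that the anchored pattern accepts the empty string because the asterisk permits zero occurrences.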